-
Notifications
You must be signed in to change notification settings - Fork 239
Node Support for Aura -> Babe Runtime Upgrades #1927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devnet-ready
Are you sure you want to change the base?
Conversation
7d8246c
to
483729e
Compare
eaba58d
to
5d2d0e2
Compare
update Cargo.lock
Hi @shamil-gadelshin, I have addressed your comments and also re-written the "Update Aug 26" section of the PR to be more in-depth and include technical reasoning. Please let me know if that is more clear, and you have any further questions. As always, appreciate your time and help reviewing this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @shamil-gadelshin, I have addressed your comments and also re-written the "Update Aug 26" section of the PR to be more in-depth and include technical reasoning.
Please let me know if that is more clear, and you have any further questions.
As always, appreciate your time and help reviewing this PR
Thank you! I was finally able to understand the warp sync issues and some of your decisions.
I have a couple of minor issues left (crate names and "triggered" variable). Otherwise, looks good.
Cargo.toml
Outdated
pallet-subtensor-swap-rpc = { default-features = false, path = "pallets/swap/rpc" } | ||
node-subtensor-runtime = { path = "runtime", default-features = false } | ||
pallet-admin-utils = { path = "pallets/admin-utils", default-features = false } | ||
pallet-collective-otf = { path = "pallets/collective", default-features = false } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gztensor What do you suggest?
…esParams" This reverts commit 00d5003.
9d0ea5d
to
914030d
Compare
After the conversation with @gztensor we confirmed that we use "-subtensor-" for such cases. @liamaharon , please, rename the crates when you have time. |
3a7e6d4
to
8f2b47e
Compare
Thanks @gregzaitsev and @shamil-gadelshin, I've replaced the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Consider merging the latest devnet-ready
to counter conflicts created by the benchmarking bot.
Thanks @shamil-gadelshin , I have merged devnet-ready and also updated the Polkadot SDK fork to include your |
just some more conflicts, btw will prob want to hold off on deploying this until next week because we are doing 2 quick deploys this week that need to be low-risk |
25fd226
to
c93bebd
Compare
Phase 2 in #1887
Phase 2 additions:
-otf
suffix to subtensor crates that share the same name as a polkadot-sdk crate. This is to prevent name collisions when applying patches, and ensure we don't accidentally use the wrong crate.Update Aug 26
I discovered there is an issue warp syncing a chain that contains an Aura to Babe migration. The warp sync would succeed up until the first Babe block is mined, and then revert to a regular sync.
Root Cause
Polkadot SDK does not allow starting a service with warp sync if the database is in a partially synced state: https://github.com/opentensor/polkadot-sdk/blob/d13f915d8a1f55af53fd51fdb4544c47badddc7e/substrate/client/network/sync/src/strategy/warp.rs#L235-L252
In the initial implementation of this PR, while a node is warp syncing, it will restart part-way through the sync when it detects the first Babe block.
At the time of the service restart, the node has a partially synced database. This causes the previously referenced code to terminate the warp sync, and fall back to a regular sync.
Solution
To resolve this issue, we must warp sync the entire chain (Aura AND Babe blocks) without restarting the service.
This is achieved in two parts:
First, I prevented the Aura service switching to a Babe service while the node is syncing. This was achieved by adding new code here:
subtensor/node/src/consensus/aura_consensus.rs
Line 222 in a234096
Second, I added support for the node to import Babe blocks while running an Aura service. This was achieved by replacing the
AuraWrappedImportQueue
with aHybridImportQueue
that containsHybridBlockImport
that contains inner full implementations ofAuraBlockImport
andBabeBlockImport
HybridVerifier
that contains inner full implementations of theAuraVerifier
andBabeVerifier
import_queue
function that builds anImportQueue
implementation capable of completely importing both Aura and Babe blocks.The Aura service is required to construct a
BabeConfiguration
to pass to the hybridimport_queue
, so it can import the first Babe block it encounters. This required me to pull in some Babe runtime configuration from #1708 into this PR, specifically theBABE_GENESIS_EPOCH_CONFIG
andEPOCH_DURATION_IN_BLOCKS
.With these runtime constants, we are able to construct what our initial Babe configuration will be while running an Aura service:
subtensor/node/src/consensus/aura_consensus.rs
Lines 249 to 285 in a234096
Summary
With the two changes described in "Solution", when a node is warp syncing, it will warp sync the entire chain (all Aura and Babe blocks) and import the state entirely before it switches to running a Babe service. This resolves the issue root cause of the issue, which is that the node would restart mid-warp-sync.
It is important this phase is merged prior to phase 3, so node operators have time to upgrade in advance of the runtime upgrade.
Steps to simulate Aura -> Babe migration with finney state
$ git checkout node-decentralization $ cargo b -r -p node-subtensor && cp ./target/release/wbuild/node-subtensor-runtime/node_subtensor_runtime.compact.compressed.wasm ./babe-npos.wasm
$ git checkout hybrid-node $ rm ./target/release/node-subtensor && cargo b -r -p node-subtensor
$ cd ../baedeker-for-subtensor $ ./localnet-baedeker.sh
QA Checklist
--initial-consensus aura
gracefully switches to Babe post-upgrade--initial-consensus babe
gracefully switches to Aura pre-upgradedevnet-ready
state/runtimehybrid-node
state/runtimebaedeker-finney
state/runtimedevnet-ready
state/runtimehybrid-node
state/runtimedevnet-ready
state/runtimehybrid-node
state/runtimebaedeker-finney
state/runtimedevnet-ready
hybrid-node
devnet-ready
state/runtimehybrid-node
state/runtimebaedeker-finney
state/runtime